Use of Lexicon Density in Evaluating Word Recognizers

Authors

  • Petr Slavík
  • Venu Govindaraju
Abstract

We have developed the notion of lexicon density as a metric to measure the expected accuracy of handwritten word recognizers. Thus far, researchers have used the size of the lexicon as a gauge for the difficulty of the handwritten word recognition task. For example, the literature reports recognizer accuracies for lexicons of size 10, 100, 1,000, and so forth. This implies that, regardless of the recognizer, the difficulty of the task increases (and hence recognition accuracy decreases) as the lexicon grows. Lexicon density is an alternative measure that depends strongly on the recognizer. Such a recognizer-dependent measure can be useful in many applications, such as address interpretation. We have conducted experiments with two different types of recognizers: a segmentation-based recognizer and a grapheme-based recognizer. They were selected to show how a lexicon density measure can be developed in general for any recognizer. Experimental results show that the lexicon density measure described here is more suitable than either lexicon size or a simple string edit distance.
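
The abstract contrasts lexicon density with two recognizer-independent gauges: lexicon size and a simple string edit distance. The sketch below illustrates only the second of these baselines. It scores a lexicon by the mean Levenshtein distance over all pairs of entries. The function names and the choice of a plain arithmetic mean are assumptions made for illustration; this is not the authors' definition. The recognizer-dependent lexicon density itself is not reproduced here.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def mean_pairwise_edit_distance(lexicon: list[str]) -> float:
    """Hypothetical baseline: average edit distance over all unordered word pairs.

    A lower value means the entries are closer together, i.e. the lexicon
    is more confusable, regardless of which recognizer is used.
    """
    pairs = list(combinations(lexicon, 2))
    if not pairs:
        raise ValueError("lexicon needs at least two entries")
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    sparse = ["Amherst", "Buffalo", "Chicago"]
    dense = ["Amherst", "Amhurst", "Anherst"]
    # Same lexicon size, different spread of entries.
    print(len(sparse), round(mean_pairwise_edit_distance(sparse), 2))
    print(len(dense), round(mean_pairwise_edit_distance(dense), 2))
```

Both example lexicons have size 3, so lexicon size alone rates them as equally hard. The near-duplicate lexicon gets a much lower mean edit distance. Even so, an edit distance treats every character confusion as equally likely. It therefore cannot reflect which substitutions a particular recognizer actually tends to make, which is the recognizer dependence the abstract emphasizes.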

Similar resources

On the Dependence of Handwritten Word Recognizers on Lexicons

The performance of any word recognizer depends on the lexicon presented. Usually, large lexicons or lexicons containing similar entries pose difficulty for recognizers. However, the literature lacks any quantitative methodology of capturing the precise dependence between word recognizers and lexicons. This paper presents a performance model that views word recognition as a function of character...

Probabilistic Model for Segmentation Based Word Recognition with Lexicon

The problem of off-line reading of unconstrained handwritten words has been studied extensively due to its role in many important applications such as reading addresses on mail-pieces [3, 6, 11], reading amounts on bank checks [7, 10], extracting census data on forms [2, 9], and reading address blocks on tax forms [12]. The main challenges are the wide variety of writing styles, poor image quality ...

A Format-Driven Handwritten Word Recognition System

A format-driven word recognition system is proposed for recognition of handwritten words. Unlike most traditional handwritten word recognizers, which are given a set of target words as a lexicon, we assume that our system is given a set of format descriptions rather than lexicon words. Applications of the proposed system include recognition of relatively more important keywords such as postal codes, ti...

Learning new word pronunciations from spoken examples

A lexicon containing explicit mappings between words and pronunciations is an integral part of most automatic speech recognizers (ASRs). While many ASR components can be trained or adapted using data, the lexicon is one of the few that typically remains static until experts make manual changes. This work takes a step towards alleviating the need for manual intervention by integrating a popular ...

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

Publication date: 2000